Is the ICH E9 estimand addendum compatible with ‘model-based’ estimands?

Acknowledgements

This work is supported by the UK Medical Research Council (MR/T023953/1).

Model-based estimands and the E9 addendum

Model-based estimands

Clinical trials are often analysed using a regression model.

By a model-based estimand, I mean the coefficient of treatment in such a model.

E.g. coefficient of treatment in a linear (continuous), logistic (binary), negative binomial (count) or Cox (time-to-event) regression model.

Typically the model also adjusts for additional baseline covariates.

Are such estimands/analyses compatible with the ICH E9 estimand addendum?

ICH E9 estimand addendum

The ICH E9 estimand addendum gives 5 attributes for an estimand:

Treatment
Population
Variable or endpoint
Handling of intercurrent events
Population-level summary

The addendum is not explicit about what ‘population-level summary’ can consist of.

But it suggests simple summary statistics, e.g. mean, median, proportion, probability.

A parameter in a parametric or semiparametric statistical model (i.e. model-based estimand) does not seem to fit the bill.

Estimand before estimator

The E9 addendum emphasizes ‘Trial planning should proceed in sequence’:

Trial objective
Estimand
Main estimator

This also implies the estimand should be defined without reference to a particular statistical estimator or model.

Again, this points to the E9 addendum not being compatible with model-based estimands.

Common model-based estimands - do they correspond to population-level summaries?

Model-based estimands - which are population-level summaries?

For some models, the model-based estimand fortunately coincides with a population-level summary estimand, with the latter being defined without reference to the model.

Next, let’s consider in turn the four model types mentioned earlier:

linear regression model for continuous outcomes
logistic regression for binary outcomes
negative binomial regression for count outcomes
Cox regression for time-to-event outcomes

Linear regression (1)

\[Y_i=\beta_0+\beta_{TRT} TRT_i + \beta_{X} X_i + \epsilon_i\]

where \(Y_i\) is the outcome, \(TRT_i\) a binary treatment indicator and \(X_i\) a baseline covariate.

Ordinarily, we would interpret \(\beta_{TRT}\) as a conditional effect - the treatment increases the mean of \(Y\) by \(\beta_{TRT}\) in each subgroup of the population defined by levels of \(X\).

In fact, because of randomisation, it can be shown (Tsiatis 2008) that \(\beta_{TRT}\) is also a marginal estimand: \[\beta_{TRT}=E(Y^1)-E(Y^0)\] where \(Y^a\) denotes the potential outcome if assigned to treatment \(a\).

\(E(Y^1)-E(Y^0)\) is a population-level summary measure - its definition is not predicated on any particular statistical model.

Equivalently, \(E(Y^1)-E(Y^0)\) is a model-free estimand.
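Here is a short derivation of this identity in the simplest case. It assumes, for illustration, that the model above is correctly specified with \(E(\epsilon_i \mid TRT_i, X_i)=0\). Randomisation (together with consistency, \(Y_i=Y_i^{TRT_i}\)) makes \(TRT\) independent of both \(X\) and the potential outcomes, so \(E(Y \mid TRT=a)=E(Y^a)\). Then

\[\begin{aligned}
E(Y^1)-E(Y^0) &= E(Y \mid TRT=1)-E(Y \mid TRT=0)\\
&= \beta_{TRT} + \beta_{X}\{E(X \mid TRT=1)-E(X \mid TRT=0)\}\\
&= \beta_{TRT},
\end{aligned}\]

since randomisation gives \(E(X \mid TRT=1)=E(X \mid TRT=0)\).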

Linear regression (2)

In fact, the linear regression estimator \(\hat{\beta}_{TRT}\) is consistent (asymptotically unbiased) for \(E(Y^1)-E(Y^0)\) even if the model is misspecified, e.g. if the true relationship between \(Y\) and \(X\) is non-linear (Wang et al 2019).

Moreover, the usual ‘model-based’ standard errors are valid provided randomisation is 1:1 (Wang et al 2019).

If randomisation is unequal, a sandwich variance estimator is needed (Bartlett 2020).
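Both points can be illustrated with a minimal simulation sketch in plain Python (standard library only). Everything in it is a hypothetical assumption:

- the data-generating model: true effect 2, an outcome that is quadratic in \(X\) (so the ANCOVA model is misspecified), and a residual SD of 1 in the treated arm versus 3 in the control arm;
- the seed;
- the helper names `solve`, `ancova` and `simulate`.

The sandwich is the basic HC0 form, without small-sample corrections. With 2:1 randomisation the point estimate should still be close to 2. Because the larger arm has the smaller residual variance, the model-based SE should come out noticeably smaller than the sandwich SE.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def inverse(a):
    """Invert a small matrix column by column using solve()."""
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def ancova(y, trt, x):
    """OLS of y on (1, trt, x); returns (beta_trt, model-based SE, HC0 sandwich SE)."""
    rows = [[1.0, t, xi] for t, xi in zip(trt, x)]
    p = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    bread = inverse(xtx)
    # Model-based SE: assumes one common residual variance for everyone
    sigma2 = sum(e * e for e in resid) / (len(y) - p)
    se_model = math.sqrt(sigma2 * bread[1][1])
    # HC0 sandwich: bread * meat * bread, keeping only the treatment entry
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(rows, resid)) for j in range(p)]
            for i in range(p)]
    var_trt = sum(bread[1][i] * meat[i][j] * bread[j][1] for i in range(p) for j in range(p))
    return beta[1], se_model, math.sqrt(var_trt)


def simulate(n, p_treat, rng):
    """Hypothetical trial: true effect 2, outcome quadratic in X, arm-specific noise SD."""
    y, trt, x = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        t = 1.0 if rng.random() < p_treat else 0.0
        noise_sd = 1.0 if t == 1.0 else 3.0
        y.append(xi + xi ** 2 + 2.0 * t + rng.gauss(0.0, noise_sd))
        trt.append(t)
        x.append(xi)
    return y, trt, x


rng = random.Random(2024)
y, trt, x = simulate(20000, 2 / 3, rng)  # 2:1 randomisation
est, se_model, se_sandwich = ancova(y, trt, x)
print(f"estimate {est:.3f}, model-based SE {se_model:.4f}, sandwich SE {se_sandwich:.4f}")
```

Changing `2 / 3` to `1 / 2` (1:1 randomisation) should bring the two standard errors close together.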

Logistic regression for binary \(Y\) (1)

\[\text{logit}(P(Y_i=1))=\beta_0+\beta_{TRT} TRT_i + \beta_{X} X_i\]

Unfortunately, the same nice properties do not hold for logistic regression.

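Due to the non-collapsibility of the odds ratio, \(\beta_{TRT}\) is a conditional (log odds ratio) estimand but not a marginal one. Even with randomisation, and even if the odds ratio is the same at every level of \(X\), it generally differs from the marginal log odds ratio.

So even if the model is correct, \(\beta_{TRT}\) is not a population-level measure, but a measure of effect within subgroups of the population.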
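A small exact calculation makes the non-collapsibility concrete. The parameter values are hypothetical. The logistic model is assumed correct, with the same conditional odds ratio \(e^{1}\approx 2.72\) at both levels of a binary covariate \(X\) that is independent of treatment (as randomisation ensures). Averaging the risks over \(X\) still gives a smaller marginal odds ratio.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


# Hypothetical, correctly specified logistic model:
#   logit P(Y=1 | TRT, X) = B0 + B_TRT * TRT + B_X * X,  X binary with P(X=1) = P_X1
B0, B_TRT, B_X = -1.0, 1.0, 2.0
P_X1 = 0.5


def marginal_risk(trt):
    """P(Y^trt = 1): average the conditional risk over the distribution of X."""
    return (1 - P_X1) * expit(B0 + B_TRT * trt) + P_X1 * expit(B0 + B_TRT * trt + B_X)


conditional_or = math.exp(B_TRT)  # identical at X = 0 and X = 1
marginal_or = odds(marginal_risk(1)) / odds(marginal_risk(0))
print(f"conditional OR = {conditional_or:.3f}, marginal OR = {marginal_or:.3f}")
# prints: conditional OR = 2.718, marginal OR = 2.230
```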

Logistic regression for binary \(Y\) (2)

If the logistic model is not correct (e.g. if the odds ratio for treatment varies by level of \(X\)), there is no single ‘conditional effect’.

Because of the risk of model misspecification, the recent FDA covariate adjustment guidance states:

“When estimating a conditional treatment effect through nonlinear regression, the model assumptions will generally not be exactly correct, and results can be difficult to interpret if the model is misspecified and treatment effects substantially differ across subgroups.”

Such concerns speak to the advantages of using model-free estimands.
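A minimal sketch of a model-free alternative follows. It standardises stratum-specific risks over the pooled distribution of a single binary covariate. This is post-stratification, i.e. standardisation with a saturated working model.

The counts are hypothetical and chosen so that the treatment odds ratio differs between strata. The marginal risk difference, risk ratio and odds ratio remain well defined regardless. With these counts the marginal odds ratio (about 0.57) is larger than both stratum-specific odds ratios (about 0.44 and 0.55).

```python
# Hypothetical trial counts: (covariate level x, arm) -> (events, patients).
# The treatment odds ratio differs between strata, so a main-effects logistic
# model for these data would be misspecified.
counts = {
    (0, 0): (20, 100), (0, 1): (10, 100),
    (1, 0): (60, 100), (1, 1): (45, 100),
}


def risk(x, arm):
    events, n = counts[(x, arm)]
    return events / n


def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Distribution of X pooled over both arms (justified by randomisation).
n_by_x = {0: 0, 1: 0}
for (x, _arm), (_events, n) in counts.items():
    n_by_x[x] += n
n_total = sum(n_by_x.values())
p_x = {x: n / n_total for x, n in n_by_x.items()}


def standardised_risk(arm):
    """Estimate of P(Y^arm = 1): stratum risks averaged over the pooled X distribution."""
    return sum(p_x[x] * risk(x, arm) for x in p_x)


r1, r0 = standardised_risk(1), standardised_risk(0)
stratum_ors = [odds_ratio(risk(x, 1), risk(x, 0)) for x in (0, 1)]
risk_difference = r1 - r0
risk_ratio = r1 / r0
marginal_or = odds_ratio(r1, r0)
print(f"stratum ORs {stratum_ors[0]:.3f}, {stratum_ors[1]:.3f}")
print(f"RD {risk_difference:.3f}, RR {risk_ratio:.4f}, marginal OR {marginal_or:.3f}")
```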

Negative binomial regression for count \(Y\)

Suppose the trial plans to follow patients up to time \(\tau\), and \(Y_i\) denotes event count over this follow-up.

Negative binomial regression then assumes

\[\log(E(Y_i))=\beta_0+\beta_{TRT} TRT_i + \beta_{X} X_i + \log(T_i)\]

where \(T_i\) is the length of time patient \(i\) was followed up, entering the model as an offset.

\(\exp(\beta_{TRT})\) is a conditional (on \(X_i\)) rate ratio estimand, assuming the model is correct.

Population-level summaries for count data

A population-level summary measure for count data is \[\Delta = \frac{E(Y^{1}(\tau))}{E(Y^{0}(\tau))}\] where \(Y^{a}(t)\) is the number of events a patient experiences up to time \(t\) if assigned to treatment \(a\).

Unlike odds ratios, rate ratios are collapsible.
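A quick check of collapsibility under hypothetical assumptions:

- two strata with different control-arm event rates;
- a constant event rate within each stratum;
- the same conditional rate ratio of 0.7 in both strata.

The marginal ratio \(\Delta\) of expected counts then equals the conditional rate ratio.

```python
# Hypothetical population: two strata of a baseline covariate with different
# control-arm event rates, and the same (conditional) rate ratio in each stratum.
strata = [
    {"share": 0.6, "control_rate": 0.5},  # events per unit time
    {"share": 0.4, "control_rate": 2.0},
]
CONDITIONAL_RATE_RATIO = 0.7
TAU = 1.0  # planned follow-up time


def expected_count(arm):
    """E(Y^arm(TAU)), assuming a constant event rate within each stratum."""
    multiplier = CONDITIONAL_RATE_RATIO if arm == 1 else 1.0
    return sum(s["share"] * s["control_rate"] * multiplier * TAU for s in strata)


delta = expected_count(1) / expected_count(0)
print(f"marginal rate ratio Delta = {delta:.3f}")  # equals the conditional rate ratio, 0.7
```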

If the negative binomial model is correctly specified, and any dropout is at random, \(\exp(\hat{\beta}_{TRT})\) is a consistent (asymptotically unbiased) estimator of \(\Delta\).

If the model is not correctly specified, the estimator may be biased.

Robust inference for rate ratio \(\Delta\)

Rosenblum and van der Laan (2010) showed that, at least if follow-up is complete for all patients, Poisson regression (unlike negative binomial regression) yields

a consistent (asymptotically unbiased) estimator of \(\Delta\)
type I error control

even if the model is misspecified, provided one uses sandwich standard errors.
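The following plain-Python sketch illustrates the point; it is not Rosenblum and van der Laan's own code. It fits a main-terms Poisson regression by Newton-Raphson and computes an HC0-type sandwich variance.

The data are hypothetical and deliberately constructed so that \(X\) is exactly balanced across arms. The treatment effect differs by \(X\) (conditional rate ratio 2 when \(X=0\) and 1 when \(X=1\)), so the model is misspecified.

The Poisson score equations for the intercept and treatment force fitted and observed totals to agree within each arm. With exact balance, this makes \(\exp(\hat{\beta}_{TRT})\) equal the ratio of arm means, 1.25. Under randomisation this balance holds only on average, which is why the result is about consistency rather than exact equality.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def inverse(a):
    n = len(a)
    cols = [solve(a, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def fit_poisson(y, design, max_iter=50, tol=1e-12):
    """Poisson (log link) regression by Newton-Raphson; returns (beta, HC0 sandwich cov)."""
    p = len(design[0])

    def mean_and_info(beta):
        mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in design]
        info = [[sum(m * row[j] * row[k] for m, row in zip(mu, design)) for k in range(p)]
                for j in range(p)]
        return mu, info

    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu, info = mean_and_info(beta)
        score = [sum((yi - m) * row[j] for yi, m, row in zip(y, mu, design)) for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu, info = mean_and_info(beta)
    bread = inverse(info)  # inverse Fisher information
    meat = [[sum((yi - m) ** 2 * row[j] * row[k] for yi, m, row in zip(y, mu, design))
             for k in range(p)] for j in range(p)]
    cov = [[sum(bread[j][u] * meat[u][v] * bread[v][k] for u in range(p) for v in range(p))
            for k in range(p)] for j in range(p)]
    return beta, cov


# Hypothetical complete-follow-up counts as (trt, x, count); X exactly balanced by arm.
data = (
    [(0, 0, c) for c in (1, 1, 0, 2)] + [(0, 1, c) for c in (3, 2, 4, 3)]
    + [(1, 0, c) for c in (2, 3, 1, 2)] + [(1, 1, c) for c in (3, 3, 2, 4)]
)
y = [float(c) for _, _, c in data]
design = [[1.0, float(t), float(x)] for t, x, _ in data]

beta, cov = fit_poisson(y, design)
rate_ratio = math.exp(beta[1])
se = math.sqrt(cov[1][1])
lower, upper = math.exp(beta[1] - 1.96 * se), math.exp(beta[1] + 1.96 * se)


def arm_mean(arm):
    arm_counts = [c for t, _, c in data if t == arm]
    return sum(arm_counts) / len(arm_counts)


crude = arm_mean(1) / arm_mean(0)
print(f"rate ratio {rate_ratio:.4f} (95% CI {lower:.2f} to {upper:.2f}); ratio of means {crude:.4f}")
```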

Cox regression

Suppose now \(Y\) is a time-to-event outcome.

The Cox model with only treatment as covariate assumes proportional hazards, i.e. that the hazard ratio between the arms is constant over time.

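No model-free population summary measure corresponds to the quantity targeted by the Cox model. For example, if hazards are not proportional, the value the estimated hazard ratio converges to depends on the censoring distribution.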

With additional covariates there is a further issue, as in the logistic case: the hazard ratio is non-collapsible.
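A small exact calculation shows what non-collapsibility does here. It assumes, hypothetically:

- a binary covariate \(X\) with \(P(X=1)=0.5\);
- exponential event times within each arm and stratum;
- proportional hazards conditional on \(X\), with hazard ratio 0.5 in both strata.

The marginal hazard ratio equals 0.5 at \(t=0\). It then rises above 0.5 (to roughly 0.67 at \(t=2\)), because high-risk patients are depleted faster in the control arm. So the marginal hazards are not proportional, and a Cox model for treatment alone is misspecified even though the model including \(X\) is correct.

```python
import math

# Hypothetical population: binary covariate X with P(X=1) = 0.5, exponential event
# times within each (arm, X) cell, and proportional hazards conditional on X
# with the same hazard ratio in both strata.
P_X1 = 0.5
CONTROL_HAZARD = {0: 0.2, 1: 1.0}
CONDITIONAL_HR = 0.5


def stratum_hazard(arm, x):
    return CONTROL_HAZARD[x] * (CONDITIONAL_HR if arm == 1 else 1.0)


def marginal_hazard(arm, t):
    """Hazard of the mixture over X: h(t) = f(t) / S(t)."""
    weights = {0: 1.0 - P_X1, 1: P_X1}
    surv = sum(w * math.exp(-stratum_hazard(arm, x) * t) for x, w in weights.items())
    dens = sum(w * stratum_hazard(arm, x) * math.exp(-stratum_hazard(arm, x) * t)
               for x, w in weights.items())
    return dens / surv


def marginal_hr(t):
    return marginal_hazard(1, t) / marginal_hazard(0, t)


for t in (0.0, 1.0, 2.0, 5.0, 20.0):
    print(f"t = {t:4.1f}: marginal HR = {marginal_hr(t):.3f}")
```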

Model-based vs. model-free estimands

Arguments for and against model-based estimands

For

Long established familiarity and acceptance among various stakeholders.
When the modelling assumptions hold, the model offers a (relatively) simple description of how the treatment and other covariates influence outcome.
When the estimands are effects conditional on prognostic baseline covariates, some argue they are more patient relevant and more transportable outside of the trial population (Harrell 2021).

Against

Model assumptions may not hold, and in advance of seeing the data, we can’t tell if they will.
If the assumptions do not hold, estimates and inferences are (arguably) difficult to interpret.

Traditional (parametric) model-based statistical inference

Reproduced with permission from ‘Targeted Learning’ by van der Laan and Rose, 2011, Springer

Arguments in favour of model-free effect estimands in trials

Clear separation between
- specification of scientific question / target of inference
- statistical estimation methods and assumptions.
Their meaning and value are not contingent on the validity of statistical assumptions, which may or may not turn out to hold.
At least if data are complete, randomisation means we can use estimators that are guaranteed to be (at least asymptotically) unbiased, i.e. their unbiasedness is not contingent on statistical modelling assumptions holding.

Conclusions and implications

Conclusions

Defining estimands within a (semi)parametric model is not compatible with the E9 estimand addendum’s requirements:
- for the estimand to be a population summary measure,
- to define the estimand before the statistical estimator.
There are strong arguments in favour of using model-free estimands in trials, and indeed more generally in statistics (Vansteelandt and Dukes 2022).

Implications

If we were to only use model-free estimands in trials, what would this mean?

For binary outcomes, no more use of conditional odds ratios for treatment effect from logistic regression.
Instead, use other measures such as the marginal odds ratio, risk ratio or risk difference.
Importantly, these can still be estimated using baseline covariates for improved precision (see FDA covariate adjustment guidance).
For time-to-event outcomes, hazard ratios would no longer be used.
Instead, use alternatives such as the restricted mean survival time (RMST) and differences in survival probability at landmark times.
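Below is a minimal sketch of these two model-free summaries, computed from Kaplan-Meier curves in plain Python. The restricted mean survival time up to \(\tau\) is the area under the survival curve from 0 to \(\tau\).

The data, the restriction time \(\tau = 6\), the landmark time 4 and the helper names are all hypothetical. No variance estimation or covariate adjustment is attempted.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, survival just after that time)."""
    steps, surv = [], 1.0
    for t in sorted({s for s, e in zip(times, events) if e}):
        n_at_risk = sum(1 for s in times if s >= t)
        n_events = sum(1 for s, e in zip(times, events) if s == t and e)
        surv *= 1.0 - n_events / n_at_risk
        steps.append((t, surv))
    return steps


def survival_at(steps, t):
    """Value of the right-continuous step function S(t)."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value


def rmst(steps, tau):
    """Area under the Kaplan-Meier curve from 0 to tau."""
    area, prev_time, prev_surv = 0.0, 0.0, 1.0
    for time, s in steps:
        if time > tau:
            break
        area += (time - prev_time) * prev_surv
        prev_time, prev_surv = time, s
    return area + (tau - prev_time) * prev_surv


# Hypothetical data: follow-up times and event indicators (1 = event, 0 = censored).
control = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
active = kaplan_meier([1, 4, 6, 7, 9], [1, 0, 1, 1, 0])
TAU, LANDMARK = 6.0, 4.0
rmst_diff = rmst(active, TAU) - rmst(control, TAU)
landmark_diff = survival_at(active, LANDMARK) - survival_at(control, LANDMARK)
print(f"RMST difference (tau = {TAU}): {rmst_diff:.2f}")
print(f"S({LANDMARK}) difference: {landmark_diff:.2f}")
```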